1. INTRODUCTION

Malware, short for malicious software, is software that infiltrates and damages computer systems
without the consent of the system owner. Over the years, cyber security standards have identified malware
as a security threat [1]. Malware includes viruses, Trojan horses, worms, exploits, botnets, and
retroviruses [2]. Among these, the computer virus, as termed by Cohen [3], is the most widely known. Early
viruses were largely harmless programs that caused minor disturbances to computer users; since then,
viruses have been developed to pursue financial goals or even to serve as virtual weapons in cyber war.

Today, creating malware and anti-malware is a billion-dollar industry. In the never-ending war
between the two, attackers implement new techniques to evade malware detection while defenders strive for
effective countermeasures. Anti-malware authors have implemented various types of heuristics to recognize
new and unknown malware, while their opponents constantly devise new methods of attack [4], [5].

The spread of malware has been the topic of numerous studies. Studies of its dynamics and
behavior in real-world networks are among the main trends. For example, the authors of [6] investigated
virus infection on computers using epidemiological compartmental models to identify potential sources of
contagion. Another group [7] introduced a moderate epidemiological model, inspired by the fractional
epidemiological model, for describing computer viruses with an arbitrary-order derivative and a
non-singular kernel. In addition, the authors of [8] proposed a novel heterogeneous virus propagation
model and a tool to optimize its propagation. Lately, the combination of malware with machine learning
(ML) and deep learning (DL) has emerged as a new research trend. Many studies, such as [9]-[16], have
focused on using ML or DL techniques to evade anti-malware systems.

Recently, we have witnessed a trend towards using bio-inspired techniques such as swarm intelligence
(SI) for complex problems in general and for cyber security in particular [17]. Frameworks for malware
evolution based on evolutionary computation were proposed in [18]. Furthermore, the researchers in [19]
exploited an evolutionary algorithm (EA) to generate malware automatically. Kudo et al. [20] proposed
botnets that adopt ML techniques to predict system vulnerabilities and to evolve autonomously.
Nevertheless, a search of the literature revealed few studies that address the combination of SI and other
intelligence techniques in malware. Hence, we aim to explore the possibility of fusing SI and ML to develop
a proof of concept malware, allowing us to identify the characteristics of this potential threat in order
to develop mitigation methods in the future. Accordingly, we experiment with a malware prototype in a
secured virtual environment, with real-time observation, data recording, visualization, and some
fundamental analysis.

The approach described in this paper is an improvement of the swarm virus introduced in [21]-[23].
We propose fusing SI, an artificial neural network (ANN), and a classical computer virus to form a new
kind of malware. This software can simulate the behavior of a biological swarm system, which usually has
no dominant central communication point. To this end, a virus prototype named X-ware, with controllable
and limited contagion, was created. Furthermore, we discuss possible mitigation approaches based on this
idea, as well as its use in complex computer systems as an artificial intelligence (AI) supervisor (deus
ex machina) for solving non-malware problems. Briefly, the significant contributions of this paper are as
follows: i) we present a novel malware proof of concept, which shows that swarm intelligence and an ANN
can be embedded in a virus; ii) we conduct experiments and demonstrate how to capture and visualize the
virus's behavior and analyze the swarm malware activities using complex network analysis; and iii) we
discuss possible mitigation and remediation ideas for future anti-malware technologies.

The remainder of this manuscript is organized as follows: section 2 presents the method for
developing X-ware, together with its structure and functionality. Section 3 contains the experimental
results and discussion. Finally, section 4 concludes the paper.


2. RESEARCH METHOD 

This section describes the method used to develop a prototype of the hypothesized X-ware. We also
show how to capture and visualize the behavior of such malware as it moves through the operating system.
The swarm virus prototype designed here mimics the behavior of a swarm system and follows the main idea
of a swarm algorithm. Malware has evolved drastically since its early days as a harmless nuisance. Various
techniques have been adopted in virus development to bypass anti-malware, such as oligomorphism,
metamorphism, polymorphism, encryption, armouring (armoured viruses), and obfuscation. Many modern
viruses, such as Mirai, Lokibot, and AZORult, are controlled through a command and control (C&C)
infrastructure. Nevertheless, C&C has a weak spot: the malware is immobilized when its control center is
taken down.

Taking the position of malware researchers, and in order to eliminate the C&C center as the weak
spot of the botnet structure, we applied swarm-based intelligence and an ANN to a traditional virus to
produce a new malware named X-ware. Technically, this virus consists of instances (many individuals)
forming a swarm (population) that propagates in the computer file system and computer network. The swarm
individuals communicate among themselves, and via specific communication channels when shifting from host
to host. The global data is stored inside each virus, so the swarm can be expected to behave in a
decentralized manner. Whenever there is a change in the swarm (e.g., a virus moves to another host), the
information is propagated to every individual in the swarm. Furthermore, when a member of the swarm is
eliminated (removed by a user or deleted by antivirus software), another one is regenerated so that the
population size remains constant. In addition, the ANN embedded in the prototype can act as an intelligent
center that holds the payload and the trigger conditions for executing it on the intended target.
Alternatively, the swarm itself can simulate the working mechanism of an ANN to enhance robustness.
Figure 1 illustrates the principle underlying the proposed idea.

Figure 1. The main idea of the X-ware and its visualization


2.1. Incorporate ANN and the virus 

This part discusses how to incorporate an ANN into the proposed virus. In this study, the concept of
a multi-layer perceptron (MLP) network is used to enhance the malware. More specifically, we adopted the
MLP in two scenarios: i) each virus in the swarm contains a complete MLP, and ii) several individuals in
the population jointly simulate the behavior of an MLP to perform their intended actions.


2.1.1. Each virus comprises a whole ANN 

In this implementation, each individual in the swarm contains an MLP. The MLP in this scenario is
composed of three layers: an input layer, a hidden layer, and an output layer. The file size is used as
input to the input layer, which contains two neurons. The hidden layer consists of two neurons, and the
output layer consists of one neuron, which produces the decision of whether or not to perform the malware
task. A variety of activation functions could be used; in this paper, for the purpose of demonstrating
functionality, the logistic sigmoid function is used. The back-propagation algorithm [24] is used to train
the MLP. The training dataset is collected from system files. After training, the optimized network
weights are integrated into the virus. When the virus executes, the embedded MLP uses these weights to
perform the computation. The MLP then evaluates the trigger conditions for execution, or it can be
responsible for other activities, such as locating the target and executing the payload on the right
object. Viruses are trained to search the system for a suitable target. The ANN then generates signals for
conducting a task. In the experiments reported here, the task is to display a message, and the file size
is used for target identification. Other attributes, such as system-level features, could also be used.
Consequently, it is very difficult to reverse the ANN to work out the target specifications. Malware can
use an ANN model, which is a black box, instead of a traditional if-then statement to camouflage its
trigger condition. This appears to be a more effective camouflage technique than obfuscation. Technically,
it complicates the work of anti-malware analysts by hiding the target categories or triggering
conditions.
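
To make the black-box argument concrete from an analyst's point of view, the following is a minimal sketch of the forward pass of a 2-2-1 MLP with logistic sigmoid activations. The weights are hypothetical placeholders chosen for readability, not the trained values used in the experiments, and the two-element input vector is an assumption, since the text only states that the input is derived from the file size.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a small MLP with one hidden layer and one output neuron."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical placeholder weights (assumption, not the paper's trained values).
HIDDEN_WEIGHTS = [[0.5, -0.5], [-0.5, 0.5]]
HIDDEN_BIASES = [0.0, 0.0]
OUTPUT_WEIGHTS = [1.0, -1.0]
OUTPUT_BIAS = 0.0

# Two normalized input features (assumed encoding of the observed attribute).
score = mlp_forward([0.0, 0.0], HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)
decision = score > 0.5  # the 0.5 cut-off is an illustrative assumption
```

An analyst who recovers only the weight arrays sees a handful of real numbers rather than an explicit comparison against a file size, which is why the decision boundary is harder to read than an if-then condition.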


2.1.2. Each virus acts as a node of an ANN

This implementation uses several viruses in the swarm to act as the input, hidden, and output
layers of an MLP. In this approach, only some individuals in the swarm (black colored) act as nodes in the
MLP network. More precisely, two individuals are used as input nodes to receive signals, three as hidden
nodes, and one as the output node. The MLP is trained to obtain the optimized weights in the same manner
as described above. The difference is that the weights are allocated to the individuals that act as nodes
in the MLP network. In other words, each node holds its own weights. When the virus executes, the swarm
simulates the working mechanism of an MLP network. More specifically, the signal values propagate from the
input viruses, through the connections to the hidden viruses, and then onward through further connections
to the output virus. With this strategy, the virus's payload can be distributed, and the virus is
extremely hard to reverse engineer (reverse engineers must capture the whole swarm).


2.2. Visualizing X-ware behavior

Previous work [21] pointed out that the movement of viruses in the swarm and the communication
inside the swarm likely follow a complex network topology, so any swarm-intelligence-based technique that
permits search over a graph could be adopted to build the complex network. The principal idea is that each
individual is represented by a vertex, and edges between vertices reflect the dynamics of the population,
i.e., interactions between individuals. For instance, the self-organizing migrating algorithm (SOMA) [25]
could be used to form a complex structure. In SOMA, a leader attracts the entire population in each
migration loop. Hence, the activated leaders are recorded as vertices, and the interactions between
leaders and individuals are recorded as edges.
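
The recording scheme can be sketched on a standard optimization benchmark. The example below runs a simplified all-to-one SOMA on the sphere function and counts one leader-to-individual edge per migration. The parameter values, the sphere objective, and the rule of recording an edge for every migrating individual are assumptions made for illustration; they are not taken from the X-ware implementation.

```python
import random
from collections import Counter


def sphere(position):
    """Benchmark cost function (assumed stand-in for the fitness)."""
    return sum(x * x for x in position)


def soma_with_edges(pop_size=5, dim=2, migrations=10,
                    path_length=3.0, step=0.11, prt=0.3, seed=1):
    """Simplified all-to-one SOMA that also records leader -> individual edges."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    edges = Counter()  # (leader index, individual index) -> interaction count
    for _ in range(migrations):
        costs = [sphere(p) for p in population]
        leader = min(range(pop_size), key=costs.__getitem__)
        leader_pos = population[leader]
        for i in range(pop_size):
            if i == leader:
                continue
            start = population[i]
            best_pos, best_cost = start, costs[i]
            t = step
            while t <= path_length:
                # The perturbation vector selects which dimensions move.
                mask = [1 if rng.random() < prt else 0 for _ in range(dim)]
                candidate = [s + (l - s) * t * m
                             for s, l, m in zip(start, leader_pos, mask)]
                cost = sphere(candidate)
                if cost < best_cost:
                    best_pos, best_cost = candidate, cost
                t += step
            population[i] = best_pos
            edges[(leader, i)] += 1  # one interaction per migration
    return population, edges


final_population, edge_weights = soma_with_edges()
```

The resulting `edge_weights` mapping is a weighted edge list: its keys give the vertices and edges of the interaction network, and its values can be used as edge weights when the network is drawn or analyzed.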


3. RESULTS AND DISCUSSION 
3.1. Results 

In the experiments reported here, a swarm of five individuals was created, i.e., there are five
virus instances. In each experiment, an individual jumped from file to file (infected a file) twenty times
in total. The behavior of the swarm was recorded and visualized. Furthermore, the behavior of the virus in
the system was analyzed with respect to several network attributes: degree centrality, closeness
centrality, betweenness centrality, and eigenvector centrality. The resulting statistics are given in
Table 1. Figure 2 presents the visualization of the networks in terms of degree, betweenness, closeness,
and eigenvector centrality. As can be seen from the graphs, several nodes are noticeably larger than the
rest (node size encodes centrality), which emphasizes their prominence in the population.
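
As an illustration of how such statistics can be obtained from a recorded movement graph, the sketch below computes degree and closeness centrality with the standard library only and summarizes them as minimum, median, and maximum. The edge list and file names are hypothetical, the graph is treated as undirected, and betweenness and eigenvector centrality are omitted for brevity; this is not the tooling used to produce Table 1.

```python
from collections import deque
from statistics import median


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def closeness_centrality(adjacency):
    """Inverse mean shortest-path distance, scaled by the reachable fraction."""
    n = len(adjacency)
    result = {}
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in distance:
                    distance[nbr] = distance[node] + 1
                    queue.append(nbr)
        total = sum(distance.values())
        reachable = len(distance) - 1
        if total == 0:
            result[source] = 0.0
        else:
            result[source] = (reachable / total) * (reachable / (n - 1))
    return result


def summarize(values):
    """Return (min, median, max) of a centrality mapping."""
    data = list(values.values())
    return min(data), median(data), max(data)


# Hypothetical recorded jumps between files (assumed names).
toy_edges = [("file_a", "file_b"), ("file_a", "file_c"), ("file_a", "file_d")]
toy_graph = build_adjacency(toy_edges)
degree_summary = summarize(degree_centrality(toy_graph))
closeness_summary = summarize(closeness_centrality(toy_graph))
```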


Table 1. Swarm virus network centralities
                          Min      Median   Max
Degree centrality         0.095    0.057    0.485
Closeness centrality      0        0.178    1
Betweenness centrality    0        0.02     0.162
Eigenvector centrality    0        0.004    1


Figure 2. Centralities of the X-ware network, capturing its movement through the host system (surviving
panel labels: closeness, eigenvector)

Figure 3 depicts the histograms of the observed network attributes. The empirical results show that
the more important a node is, the higher its centrality. Consequently, a node with higher centrality has a
higher probability of being visited. In other words, important files have higher centrality, which means
they have a higher chance of being infected by the malware prototype. In contrast, less important files
have lower centrality values and a lower chance of being visited by the virus.

Figure 3. Histogram of the network centralities


As shown in Figure 3, the closeness centrality is spread over the entire network. This indicates
that the majority of the nodes contribute evenly to the network. In this graph, the bigger nodes have
higher betweenness centrality. Notably, the node with the highest betweenness centrality also has the best
fitness. One crucial aspect to take into account is that once the node with the best fitness changes, the
betweenness centrality of the system changes as well. Furthermore, in Figure 3, colour coding is used to
distinguish centrality: darker nodes are more central, and lighter nodes are less central. As shown in
Figure 3, some individuals are more influential than others in the swarm.


3.2. Discussion 

Our experiments show that files with higher fitness also have higher centrality metrics, which
means these files are more important than, and distinguishable from, the others. In contrast, less
critical files have lower centrality estimates. This means that a random X-ware instance moving through
the system files or network has a higher probability of infecting a file with higher fitness.

Furthermore, Figure 3 indicates a strong inverse relationship between centrality values (for
betweenness, degree, and eigenvector centrality) and their frequencies, which suggests that they can be
modelled by a power-law distribution. In other words, the majority of vertices have low centrality values,
while only a minority of vertices have high centrality values. On the other hand, the distribution of
closeness centrality follows a normal curve. The relationship between degree centrality and its frequency
matches the power-law curve, which may indicate that the network has a scale-free character.
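
A rough way to check the power-law reading of a histogram is to fit a straight line to the logarithm of the frequency against the logarithm of the value; for a power law f(k) proportional to k^(-gamma), the slope of that line is -gamma. The sketch below does this with ordinary least squares on synthetic data (an assumption, not the measured histogram). Such a fit is only a quick diagnostic and, on a network with a few dozen vertices, weak evidence at best.

```python
import math


def loglog_slope(values, frequencies):
    """Least-squares slope of log(frequency) against log(value)."""
    points = [(math.log(v), math.log(f))
              for v, f in zip(values, frequencies) if v > 0 and f > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two positive points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Synthetic histogram following f(k) = 100 * k^-2 (illustrative assumption).
bin_values = [1, 2, 3, 4, 5]
bin_frequencies = [100.0 * k ** -2 for k in bin_values]
slope = loglog_slope(bin_values, bin_frequencies)
```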

X-ware: a proof of concept malware utilizing artificial intelligence (Thanh Cong Truong)

1942    ISSN: 2088-8708

The essential point is that files with higher fitness should be distinct. Our experiments show that
the most important files also have higher centrality. In the figures, the nodes that represent important
files are bigger than the others, while less important ones are smaller. Figure 2 shows the dependency
between importance and centrality. In these graphs, the most important file is the biggest node in the
figure, so it is also the most influential node in the network. Furthermore, darker nodes are more
influential than the others. These results suggest that if we can identify the most central node in the
network, then the spreading rate of the virus could be decreased and the virus potentially eradicated.
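
The mitigation idea can be illustrated on a toy graph: isolate the most central node and measure how much of the network remains connected. The sketch below uses degree as the centrality measure and the size of the largest connected component as a proxy for how far an infection could still spread; the star-shaped graph, the node names, and the choice of proxy are assumptions for illustration only.

```python
def largest_component_size(adjacency, removed=frozenset()):
    """Size of the largest connected component after ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best


# Hypothetical movement graph: one hub file linked to four other files.
graph = {
    "hub": {"f1", "f2", "f3", "f4"},
    "f1": {"hub"},
    "f2": {"hub"},
    "f3": {"hub"},
    "f4": {"hub"},
}

most_central = max(graph, key=lambda node: len(graph[node]))
size_before = largest_component_size(graph)
size_after = largest_component_size(graph, removed={most_central})
```

In this toy case, protecting the single hub reduces the largest connected component from five nodes to one, which is the effect the centrality analysis aims to exploit.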

The prototype version of X-ware was developed and observed in a controlled environment so that its
behavior and data could be easily obtained. In reality, however, identifying self-replicating swarm
structures is a difficult task. As our experiments verify, X-ware has two significant features: i) it
moves across hosts while keeping the population size constant, and ii) its individuals communicate with
one another. These features can serve as the critical criteria for identifying similar future prototypes.
Hence, we suggest that protection systems should not destroy such instances instantly but should observe
them and analyze their activity data as a whole, in order to discover the activities that can be expected
from X-ware-like malware (i.e., movement, communication, trigger). Additionally, complex network
visualization and analysis can help considerably. By applying network analysis, we can discover the
critical nodes, which play an essential role in the swarm network, and take corresponding actions based on
the analysis. The ideas behind X-ware are also applicable to designing future anti-malware solutions. For
instance, we can create adaptive, autonomous AI agents that collaborate to achieve common tasks. Instead
of getting guidance from a single, centralized AI model, the agents would be smart and robust enough to
communicate with each other and work together to achieve common goals.

Agents will learn how to protect systems depending on what they observe in their networks and local
hosts. Their strength is further enhanced by observations and behaviors learned across different
industries and sectors. Generally speaking, we will have a swarm of rapid-response local AIs that adapt to
their environment while collaborating with each other, instead of one big AI system delivering decisions.
This will improve organizations’ IT performance by saving resources and by helping them avoid sharing
confidential, potentially sensitive information through the cloud or other means.


4. CONCLUSION 

In this paper, we presented X-ware, a proof of concept malware using an ANN and SI, and studied its
features in order to inform future anti-malware solutions. We discussed the method used to develop an
X-ware prototype in which the ANN acts as an intelligent center that holds the payload and the trigger
conditions, with no destructive payload and with controllable contagion. Furthermore, practical
experiments were carried out to visualize, measure, and analyze the behavior of X-ware in the form of a
complex network. Our research pointed out that its movement and internal communication follow a complex
network topology. In addition, information was shared, updated, and used by all swarm members directly in
the course of communication. The results of this work offer a better understanding of the behavior of a
possible new generation of malware, with the aim of protecting future computer technology. As this work
has shown, the X-ware prototype is a swarm in which all individual viruses can communicate among
themselves as a swarm in nature would. This leads to the possibility of adapting the idea to other kinds
of malware, such as worms or Trojans, so that their activities become more distributed and robust. In that
case, the environment comprises not only PCs but also networks. In future work, we will focus on proposing
it as an autonomous anti-malware technology (i.e., deus ex machina) for complex and large systems.